Poster: Moongleam Date: Dec 30, 2011 8:07am
Forum: feature_films Subject: Re: PD?

In most cases the text was correct in the printed book, and the error was introduced by the OCR process.

One cannot expect OCR to be perfect. If you examine printing in a paper book with a powerful magnifying glass, you'll see that the letters aren't perfectly formed. No two e's will be exactly the same. (There could have been too much or too little ink on the lead type; the paper could have imperfections, etc.) Furthermore, the OCR program doesn't know what typeface was used in the book, so it doesn't know the exact shape of the characters.
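
To make that concrete, here is a toy sketch in Python. It is not how real OCR software works; it only compares a scanned letter against stored shapes pixel by pixel and picks the closest. The 5x5 letter shapes, the names `TEMPLATES`, `distance`, and `classify`, and the two damaged e's are all made up for illustration.

```python
# Toy 5x5 bitmaps: '#' is ink, '.' is blank paper.
# These shapes are hypothetical, not taken from any real typeface.
TEMPLATES = {
    "e": [".###.", "#...#", "#####", "#....", ".####"],
    "c": [".###.", "#....", "#....", "#....", ".####"],
    "o": [".###.", "#...#", "#...#", "#...#", ".###."],
}


def distance(a, b):
    """Count the pixels where two equally sized bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def classify(glyph):
    """Return the template letter with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda letter: distance(glyph, TEMPLATES[letter]))


# An 'e' printed with slightly too much ink: one extra pixel.
blotted_e = [".###.", "#...#", "#####", "##...", ".####"]

# An 'e' printed with too little ink: most of the crossbar is missing.
faded_e = [".###.", "#....", "##...", "#....", ".####"]

print(classify(blotted_e))  # still read as 'e'
print(classify(faded_e))    # misread as 'c'
```

The blotted e differs from the stored e by a single pixel, so it is still recognized. The faded e has lost four pixels and ends up only one pixel away from the stored c, so this naive matcher reads it as a c, which is the kind of error that shows up in OCR text.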

It is amazing that OCR works as well as it does (the algorithms used must be very sophisticated), but a literate human is much better at deciphering letters on paper.

This post was modified by Moongleam on 2011-12-30 16:07:31